Phoneme-level speech and natural language integration for agglutinative languages

Authors

  • Gary Geunbae Lee
  • Jong-Hyeok Lee
  • Kyunghee Kim
Abstract

A new tightly coupled speech and natural language integration model is presented for a TDNN-based large vocabulary continuous speech recognition system. The popular n-best techniques were developed mainly for integrating HMM-based speech and natural language systems at the word level, which is inadequate for morphologically complex agglutinative languages. In contrast, our model constructs a spoken language system based on phoneme-level integration. The TDNN-CYK spoken language architecture is designed and implemented using a TDNN-based diphone recognition module integrated with table-driven phonological/morphological co-analysis. Our integration model provides a seamless integration of speech and natural language for connectionist speech recognition systems, especially for morphologically complex languages such as Korean. Our experimental results show that speaker-dependent continuous Eojeol (word) recognition can be integrated with morphological analysis, with a morphological analysis success rate of over 80% directly from the speech input for middle-level vocabularies.

...dation). We also thank WonIl Lee for coding the lexicon and the morphological parser, and Professor Hong Jeong for his valuable suggestions on the earlier draft of this paper. An extended version of this paper was submitted to the journal of Natural Language Engineering for review.
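The table-driven, CYK-style morphological analysis over a phoneme sequence mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' TDNN-CYK implementation: the TDNN diphone recognizer is replaced by a fixed list of romanized phonemes, and the lexicon `LEXICON`, the connectivity table `CONNECT`, the category names, and the function `analyze` are all hypothetical, chosen only to show how a chart over phoneme spans can yield morpheme segmentations.

```python
from typing import Dict, List, Set, Tuple

Analysis = List[Tuple[str, str]]  # sequence of (morpheme, category)

# Hypothetical toy lexicon: phoneme sequence -> morpheme category.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("h", "a", "k"): "NOUN",
    ("k", "yo"): "NOUN",
    ("h", "a", "k", "k", "yo"): "NOUN",
    ("e",): "PARTICLE",
    ("e", "s", "eo"): "PARTICLE",
}

# Hypothetical connectivity table: (left category, right category) pairs
# that are allowed to be adjacent inside one word.
CONNECT: Set[Tuple[str, str]] = {("NOUN", "NOUN"), ("NOUN", "PARTICLE")}


def analyze(phonemes: List[str]) -> List[Analysis]:
    """Return every segmentation of `phonemes` into lexicon morphemes
    whose adjacent categories are licensed by CONNECT."""
    n = len(phonemes)
    # chart[i][j] holds all analyses found for phonemes[i:j].
    chart: List[List[List[Analysis]]] = [
        [[] for _ in range(n + 1)] for _ in range(n + 1)
    ]
    for span in range(1, n + 1):           # shorter spans first, as in CYK
        for i in range(0, n - span + 1):
            j = i + span
            key = tuple(phonemes[i:j])
            if key in LEXICON:             # the span is a single morpheme
                chart[i][j].append([("".join(key), LEXICON[key])])
            for k in range(i + 1, j):      # combine two adjacent sub-spans
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        if (left[-1][1], right[0][1]) in CONNECT:
                            candidate = left + right
                            if candidate not in chart[i][j]:
                                chart[i][j].append(candidate)
    return chart[0][n]


if __name__ == "__main__":
    for analysis in analyze(["h", "a", "k", "k", "yo", "e", "s", "eo"]):
        print(analysis)
```

Because the chart is indexed by phoneme positions rather than by word boundaries, no word-level hypothesis has to be fixed before morphological analysis starts, which is the point of integrating at the phoneme level. A real system would also have to handle recognition errors and phonological changes at morpheme boundaries, which this sketch omits.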

Similar resources

A Viterbi-based morphological analysis for speech and natural language integration

This paper presents a statistical/symbolic hybrid morphological analysis, called V-morph, for large scale speech and natural language integration for Korean. In the V-morph approach, statistical Viterbi-based lexical decoding and symbolic morphological modeling are integrated together on top of connectionist phoneme recognition engine. Linguistic characteristics of Korean are appropriately cons...

Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing

This paper presents a multi-strategic and hybrid approach for large-scale integrated speech and natural language processing, employing connectionist, statistical and symbolic techniques. The developed spoken Korean processing engine (SKOPE) integrates connectionist TDNN-based phoneme recognition technique with statistical Viterbi-based lexical decoding and symbolic morphological/phonological an...

Joint PoS Tagging and Stemming for Agglutinative Languages

The number of word forms in agglutinative languages is theoretically infinite and this variety in word forms introduces sparsity in many natural language processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity. In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for ag...

A Language-Independent Unsupervised Model for Morphological Segmentation

Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological...

Turkish LVCSR: Database Preparation and Language Modeling for an Agglutinative Language

Turkish language is an agglutinative language. It is possible to produce a very high number of words from the same root with suffixes [1]. Language modeling for agglutinative languages needs to be different than modeling of languages like English. Such languages also have inflections but not as many as an agglutinative language. Techniques which can be used for modeling agglutinative languages ...

Journal:
  • CoRR

Volume abs/cmp-lg/9411013

Publication date 1994